<?xml version="1.0" encoding="utf-8"?><!DOCTYPE article  PUBLIC '-//OASIS//DTD DocBook XML V4.4//EN'  'http://www.docbook.org/xml/4.4/docbookx.dtd'><article><articleinfo><title>Automatic detection and correction of annotation errors in Polish language corpora</title><revhistory><revision><revnumber>7</revnumber><date>2013-09-16 09:25:10</date><authorinitials>LukaszKobylinski</authorinitials></revision><revision><revnumber>6</revnumber><date>2013-09-16 09:21:21</date><authorinitials>MichalLenart</authorinitials></revision><revision><revnumber>5</revnumber><date>2013-09-16 09:21:01</date><authorinitials>MichalLenart</authorinitials></revision><revision><revnumber>4</revnumber><date>2013-09-16 09:20:09</date><authorinitials>MichalLenart</authorinitials></revision><revision><revnumber>3</revnumber><date>2012-02-23 14:41:48</date><authorinitials>LukaszKobylinski</authorinitials></revision><revision><revnumber>2</revnumber><date>2012-02-13 14:37:44</date><authorinitials>LukaszKobylinski</authorinitials></revision><revision><revnumber>1</revnumber><date>2012-02-13 14:37:33</date><authorinitials>LukaszKobylinski</authorinitials></revision></revhistory></articleinfo><section><title>Automatic detection and correction of annotation errors in Polish language corpora</title><section><title>Project factsheet</title><informaltable><tgroup cols="2"><colspec colname="col_0"/><colspec colname="col_1"/><tbody><row rowsep="1"><entry colsep="1" rowsep="1"><para> English name:         </para></entry><entry colsep="1" rowsep="1"><para> Automatic detection and correction of annotation errors in Polish language corpora </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Polish name:          </para></entry><entry colsep="1" rowsep="1"><para> Automatyczne wykrywanie i korekcja błędów anotacyjnych w polskich korpusach językowych </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Project type:         </para></entry><entry colsep="1" 
rowsep="1"><para> A <ulink url="http://www.ncn.gov.pl/?language=en">National Science Centre</ulink> research grant (number 2011/01/N/ST6/01107) </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Duration:             </para></entry><entry colsep="1" rowsep="1"><para> 21 December 2011 ‒ 20 December 2013 </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Principal investigator: </para></entry><entry colsep="1" rowsep="1"><para> Łukasz Kobyliński </para></entry></row><row rowsep="1"><entry colsep="1" rowsep="1"><para> Institution:          </para></entry><entry colsep="1" rowsep="1"><para> <ulink url="http://zil.ipipan.waw.pl/">Institute of Computer Science, Polish Academy of Sciences</ulink> </para></entry></row></tbody></tgroup></informaltable></section><section><title>Project summary</title><para>The project has three main goals: to improve existing methods for the automated detection of annotation errors in text corpora at the morpho-syntactic level, to develop an accurate error detection method for Polish language resources, and to provide an efficient tool that can be used to automatically correct tagging errors in English and Polish corpora. </para><para>The quality of low-level (morpho-syntactic) corpus annotation is crucial because this annotation is used to train automated taggers. Often a gold-standard subcorpus is selected from a larger collection of documents and serves as training material for taggers, which are then used to annotate the complete corpus. The precision of annotation in such a subcorpus therefore influences the tagging quality of the entire corpus and has a direct impact on the accuracy of higher levels of text processing, such as semantic layers of annotation. </para></section></section></article>